Add online trade-outcome ML model and integrate into AdaptiveSelector (AI decisioning) #4
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc10cafff9
```python
model = self._get_outcome_model(regime, strat)
label = 1 if pnl_pct > 0 else 0
weight = min(2.0, max(0.25, abs(pnl_pct) + 0.25))
model.update(pending_state, label, sample_weight=weight)
```
Gate outcome-model training on executed trades only
This path updates OnlineTradeOutcomeModel for any non-unknown strategy, but report_result(0, 0) is also called when RiskManager rejects a signal and no order is placed (darwin_agent/core/agent_v2.py lines 341 and 391). In that common rejection flow, the code still assigns label = 0 and trains the model as if a real losing trade occurred, so operational constraints (max positions, daily limits, etc.) get baked into ml_win_prob as false negatives and can systematically skew future confidence blending.
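One way to express the suggested gate, sketched as a standalone helper; the `filled` flag and the helper name are assumptions for illustration, not part of the current `report_result()` signature:

```python
from typing import Any

def train_outcome_model(model: Any, pending_state: Any, pnl_pct: float, filled: bool) -> None:
    """Sketch of the suggested gate. `filled` is an assumed flag meaning an order
    was actually placed and closed; `model` is expected to expose an
    update(features, label, sample_weight=...) method like OnlineTradeOutcomeModel."""
    if not filled:
        # report_result(0, 0) after a RiskManager rejection carries no trade
        # outcome, so skip the update instead of recording a spurious loss.
        return
    label = 1 if pnl_pct > 0 else 0
    weight = min(2.0, max(0.25, abs(pnl_pct) + 0.25))
    model.update(pending_state, label, sample_weight=weight)
```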
```python
                                  next_state=next_state, done=done,
                                  metadata={"regime": regime}))
self._update(state, action, reward, next_state, done, regime)
self._remember_pattern(state, action, reward, regime)
```
Skip pattern-memory updates for non-filled decisions
The new retrieval memory is updated on every learn() call, including report_result(0, 0) invocations used for rejected (unfilled) trades. Because reward shaping adds a positive baseline (calculate_reward), these rejected decisions can be recorded as wins in _remember_pattern, which then feeds back into _pattern_bias and reinforces actions that were never actually executed, corrupting the memory signal used during action selection.
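A minimal sketch of the guard, written against a toy stand-in for the retrieval memory; the class name and the `filled` flag are illustrative assumptions, not the module's actual API:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class PatternMemorySketch:
    """Illustrative stand-in for the brain's retrieval memory."""
    patterns: List[Dict[str, Any]] = field(default_factory=list)

    def remember(self, state: Any, action: int, reward: float, regime: str,
                 filled: bool) -> None:
        # Only record decisions that produced a real fill: rejected signals carry
        # the reward-shaping baseline rather than a genuine outcome, and would
        # otherwise feed back into the pattern bias as fake wins.
        if not filled:
            return
        self.patterns.append({
            "state": state,
            "action": action,
            "reward": reward,
            "regime": regime,
            "win": reward > 0,
        })
```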
Motivation
Description
- Added `OnlineTradeOutcomeModel` in `darwin_agent/ml/outcome_model.py` with robust feature sanitization, weighted online updates, and `export()`/`from_dict()` persistence.
- Integrated it into `AdaptiveSelector` (`darwin_agent/ml/selector.py`) so `decide()` queries `ml_win_prob`, blends it into `signal.confidence`, includes it in the AI reason text, and stores model metadata in the pending trade.
- `report_result()` now updates the corresponding outcome model with a sign label and a magnitude-weighted sample weight; the selector also persists/restores outcome models via `export_for_dna()`/`import_from_dna()`.
- `darwin_agent/ml/brain.py` gained explainability and lightweight pattern-memory hooks used in action explanations, and `README.md` was updated to mention the new online trade-outcome ML capability.
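The interface the description implies could look roughly like the following; this is a hedged sketch built on a plain SGD logistic regression, not the PR's actual implementation of `OnlineTradeOutcomeModel`:

```python
import math
from typing import Dict, List

class OnlineOutcomeModelSketch:
    """Sketch of the described surface: weighted online updates,
    predict_proba(), and export()/from_dict() persistence."""

    def __init__(self, n_features: int, lr: float = 0.05) -> None:
        self.n = n_features
        self.lr = lr
        self.w = [0.0] * n_features
        self.b = 0.0

    def _sanitize(self, features: List[float]) -> List[float]:
        # Pad/truncate to a fixed width, zero out NaN/inf, and clip extremes so
        # a single bad feature cannot destabilize the weights.
        padded = (list(features) + [0.0] * self.n)[:self.n]
        clean = []
        for v in padded:
            v = float(v)
            if v != v or v == float("inf") or v == float("-inf"):
                v = 0.0
            clean.append(max(-10.0, min(10.0, v)))
        return clean

    def predict_proba(self, features: List[float]) -> float:
        x = self._sanitize(features)
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-35.0, min(35.0, z))  # keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: List[float], label: int, sample_weight: float = 1.0) -> None:
        # One SGD step on weighted log-loss: larger |pnl| -> larger sample_weight.
        x = self._sanitize(features)
        err = (self.predict_proba(x) - label) * sample_weight
        self.b -= self.lr * err
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]

    def export(self) -> Dict:
        return {"n": self.n, "lr": self.lr, "w": list(self.w), "b": self.b}

    @classmethod
    def from_dict(cls, data: Dict) -> "OnlineOutcomeModelSketch":
        model = cls(data["n"], data.get("lr", 0.05))
        model.w = list(data["w"])
        model.b = data["b"]
        return model
```

A selector could then blend the probability into its existing score, e.g. `confidence = 0.7 * confidence + 0.3 * model.predict_proba(features)`; the 70/30 split is an assumption here, not the PR's actual blend.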
Testing
- Ran `python -m compileall darwin_agent`, which completed successfully.
- Called `decide()`, `report_result()`, `export_for_dna()` and `import_from_dna()` and observed no runtime errors (decision printed, models loaded).
- Exercised `predict_proba()`, ran multiple `update()` calls, and verified the probability moved and `export()`/`from_dict()` preserved state.
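For illustration only, the last two checks can be mimicked against the sketch class shown under Description (the real tests exercise `OnlineTradeOutcomeModel` and `AdaptiveSelector`; feature values here are arbitrary):

```python
features = [0.1, -0.2, 0.5, 1.3]
model = OnlineOutcomeModelSketch(n_features=4)

p_before = model.predict_proba(features)
for _ in range(50):
    model.update(features, label=1, sample_weight=1.0)
p_after = model.predict_proba(features)
assert p_after > p_before  # probability moved toward the repeated wins

restored = OnlineOutcomeModelSketch.from_dict(model.export())
assert restored.predict_proba(features) == p_after  # export()/from_dict() preserved state
```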